(54) Speech recognition

(57) A method of speech recognition includes recognizing a first utterance, recognizing a second utterance having information that is related to the first utterance, and determining the most probable first and second utterances based on stored information about valid relationships between possible first and second utterances. The recognized first utterance may be recognized continuously and the recognized second utterance may be recognized discretely. The determination of the most probable utterances may include creating a list of possible utterances that could be confused with a recognized utterance and rerecognition of a list of possible utterances against an utterance.
Description

The invention relates to speech recognition.

A speech recognition system attempts to determine, either on a continuous or discrete basis, what words were intended, based on analysis of a speaker's utterances. A variety of techniques have been used to improve the accuracy of the recognition.

In one aspect, the invention features a method of speech recognition including recognizing a first utterance, recognizing a second utterance having information that is related to the first utterance, and determining the most probable first and second utterances based on stored information about valid relationships between possible first and second utterances.

The method may further include determining the validity of a recognized utterance and including an invalid utterance 
in a list of possible utterances for comparison with the possible utterances in the list. 

In some embodiments of the invention, the determination of the most probable utterances may include rerecognition 
of a list of possible utterances against at least one utterance. 

Implementations of the method may further include the following features: ranking the list of possible utterances based upon how closely each possible utterance corresponds to the at least one utterance; creating an hypothesized list of possible utterances that relate to the at least one recognized utterance based on the stored information; comparing the ranked list of possible utterances to the hypothesized list of possible utterances for commonality between the lists; creating an hypothesized list of possible utterances that relate to the at least one recognized utterance based on the stored information; creating a list of possible utterances that could be confused with the at least one recognized utterance.

In some embodiments of the invention, the recognized first utterance is recognized continuously and the recognized 
second utterance is recognized discretely. 

Implementations of this aspect of the method may further include the following features. A list is created of possible first utterances that may be confused with the recognized first utterance. An hypothesized list of possible second utterances is created that relate to the possible first utterances based on the stored information. The recognized second utterance is added to the hypothesized list of possible second utterances to create a merged list of possible second utterances. The merged list of possible second utterances is rerecognized against the second utterance to get a ranked list of possible second utterances, the ranking based upon how closely each possible second utterance in the merged list corresponds to the second utterance. The ranked list of possible second utterances is compared to the hypothesized list of possible second utterances for commonality between the lists, the highest ranked possible second utterance in the ranked list being compared first. An hypothesized list of possible first utterances is created from the second recognized utterance based on the stored information. The recognized first utterance is added to the hypothesized list of possible first utterances to create a merged list of possible first utterances. The merged list of possible first utterances is rerecognized against the first utterance to get a ranked list of possible first utterances having a ranking based upon how closely each possible first utterance in the merged list corresponds to the first utterance. A possible first utterance ranked second in the ranked list of possible first utterances is evaluated by determining whether a distance parameter associated with the second ranked possible first utterance is within an acceptable limit. A user is advised when the second ranked possible first utterance is not within the acceptable limit and no commonality exists between the ranked list of possible second utterances and the hypothesized list of possible second utterances.

In some embodiments of the invention, the first utterance is a zipstate and the second utterance is a city from a destination address on a package, the determination of the most probable first and second utterances resulting in the sorting of the package according to the package's destination address.

In some embodiments of the invention, the first utterance is a spelled prefix including ordered symbols and the second utterance is a word. A list is created of possible prefixes that could be confused with the recognized prefix. Creating the list of possible prefixes includes determining, in the context of a preceding symbol or silence, a probability of confusing each recognized symbol in the prefix with each symbol in a list of possible symbols, of confusing each recognized symbol in the prefix with an addition of an extra symbol preceding the recognized symbol, and of confusing each recognized symbol in the prefix with the absence of a symbol. Creating the list of possible prefixes includes replacing a sequence of symbols with a single symbol.

In some embodiments of the invention, the first utterance is a spelled word and the second utterance is a word, 
the determination of the most probable first and second utterances resulting in recognizing the spelled word. 

In another aspect, the invention features a method of generating a choice list from a continuously recognized utterance including continuously recognizing a spoken utterance, consulting stored information to determine the probability of confusing possible utterances in the stored information with the recognized utterance, and producing a list of possible utterances from the stored information that could be confused with the recognized utterance.

The method may include rerecognizing the utterance against a merged list of the list of possible utterances and the recognized utterance to create a ranked list of possible utterances having a ranking based upon how closely each utterance in the merged list corresponds to the spoken utterance.

In another aspect, the invention features a method of recognizing ambiguous inputs including recognizing a first ambiguous input, recognizing a second ambiguous input having information that is related to the first ambiguous input, and determining the most probable first and second ambiguous inputs based on stored information about valid relationships between possible first and second ambiguous inputs.

In another aspect, the invention features a method of training a speech recognizer including prompting a user to 
make a first utterance including symbols, recognizing the symbols, and calculating the probability of confusing each 
recognized symbol with the prompted symbol. The probabilities are calculated within the context of the preceding 
symbol or silence. 

In another aspect, the invention features a method of displaying word choices during speech recognition including recognizing an uttered word, recognizing a spelling of a prefix of the word, whereby symbols are used to spell the prefix, and displaying a list of word choices on a screen for selection, a top choice on the list corresponding to a highest ranked choice. The symbols are letters, digits, and punctuation.

Advantages of the invention may include one or more of the following. Recognition accuracy is improved considerably over prior speech recognition systems. Any two utterances containing related data can be recognized with improved accuracy. The invention can hypothesize a list of choices from a continuously recognized utterance. The invention can determine if an utterance contains invalid data. The invention can improve the ability of a recognizer to determine a word that has been spelled even though the individual letters ("b" versus "e") may be difficult to distinguish. Other advantages and features will become apparent from the following description and from the claims.

Description

Figure 1 is a schematic diagram of a package sorting system.
Figure 2 is a functional block diagram of a city, state and zip code determining system.
Figure 3 is a functional block diagram of a confusability algorithm.
Figures 4-4b are flow diagrams showing an implementation of the address determining system.
Figure 5 is a confusability matrix of probabilities.
Figure 6 is an example of calculated confusion values.
Figure 7 is a listing of zip codes and their related confusion values.
Figure 8 is a functional block diagram of a system for determining a word from the spelling of the word.
Figure 9 is a functional block diagram of a recognition system.
Figure 10 is a functional block diagram of a system for determining a word from the spelling of the word.
Figure 11 is a functional block diagram of a confusability algorithm.
Figures 12-12b are flow diagrams showing an implementation of the word determining system.
Figures 13-13b are confusability matrices of probabilities.
Figure 14 is an example of calculated confusion values.
Figure 15 is a listing of words and their related confusion values.
Figures 16-16a are forward and backward diagrams representing Baum-Welch training.
Figure 17 is a diagrammatic representation of a user interface.
As seen in Fig. 1, in system 10 for sorting parcels 12, user 14 reads the address 11 on each parcel 12 and a microphone 16 conveys the spoken information to a computer 18. A speech recognizer 19, such as Voice Tools™ or Dragon Dictate™, available from Dragon Systems, Inc., is supplemented with an associated hypothesis algorithm 21 that determines the spoken address based on the output of the recognizer and notifies the user whether the address is valid. To find that an address is valid, in most instances system 10 must be able to determine the existence of a valid city, state and zip code combination (e.g., Somerville, MA 02143) from the speech recognized address; for states with one shipping destination for all cities only a valid state and zip code is required (e.g., Montana 59100).

For a valid address, the zip code information 23 is sent to a second computer 25 which routes the package along a "tree" of sorting conveyors 20, e.g., by tracking the package using electric eyes or machine readable labels. Package 12 is transported along a predetermined path 22 on conveyors 20 leading to a truck 24 going to the package's destination.

For an invalid address, the user may repeat the address or reject the package as having an invalid address. 

Referring to Fig. 2, the first step in the sorting process is that the microphone receives the utterances of the "zipstate" 30 and "city" 32 on the package. The zipstate is the zip code and state spoken as one word, e.g., "02143Massachusetts". Using continuous speech recognition, zipstate recognizer 34 determines the zipstate 38 it believes corresponds to the spoken zip code and state. Each number of the zip code is recognized against a vocabulary of numbers zero through nine and the state is recognized against a vocabulary of states. If the recognizer is not able to recognize five digits and a state, the address is rejected as invalid and an invalid address signal 35 is sent to a sound generator 36 in the headphones which produces a beep 37 signaling the user that the address is invalid.

If the recognizer is able to recognize five digits and a state, a confusability algorithm 39 (described below) is run to create a list of all hypothesized cities 40 that correspond to the recognized zipstate 38 and to other zipstates that the recognizer would be likely to confuse with the recognized zipstate (these other zipstates are limited to valid zip codes within the recognized state). To this list of hypothesized cities 40 is added a list of cities 42 determined by a discrete city recognizer 41. The uttered city 32 is recognized against a vocabulary of valid cities 43 that correspond to the recognized state to produce the list 42 of possible cities.

The list of hypothesized cities 40 and the list of cities 42 are merged 44 to create a city list 45. City list 45 is then used as a city vocabulary against which the city utterance is rerecognized 46 and a ranked list of cities 47 is generated based on a determination by the recognizer of how closely it believes each city in the merged city list 45 matches the uttered city 32. The top city in the ranked list of cities 47, i.e., the best recognized city, is then compared 48 to the hypothesized cities 40. If the top city is in the hypothesized city list, that city is deemed the correct destination, and the package is routed to that city. If not, the next city in the ranked list is compared to the hypothesized cities, and so on; the first match determines the destination city 49 which is delivered to the second computer 25.
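The merge and first-match comparison described above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names `merge_city_lists` and `pick_destination` are hypothetical, and the ranked list is assumed to come from a recognizer that is outside the scope of the sketch.

```python
def merge_city_lists(hypothesized, recognized):
    """Merge the two city lists, dropping duplicates while keeping order."""
    merged = []
    for city in list(hypothesized) + list(recognized):
        if city not in merged:
            merged.append(city)
    return merged


def pick_destination(ranked_cities, hypothesized):
    """Return the highest-ranked city that also appears in the hypothesized list.

    ranked_cities is assumed to be ordered best match first. None is returned
    when the two lists share no city, which corresponds to the "no commonality"
    case handled separately in the description.
    """
    hypothesized_set = set(hypothesized)
    for city in ranked_cities:
        if city in hypothesized_set:
            return city
    return None
```

With illustrative inputs, `pick_destination(["Sagamore", "Somerville"], ["Somerville", "Hanover"])` skips the top-ranked city because it is not hypothesized and returns the second one.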

If no commonality exists between the ranked list of cities 47 and the hypothesized cities 40, zipstates 51 are hypothesized 50 from the ranked list of cities 47 by consulting a database 52 of zip codes and cities. The originally recognized zipstate 38 is merged 53 with the hypothesized zipstates 51 and the resulting zipstate list 54 is used as a zipstate vocabulary against which the zipstate utterance 30 is discretely rerecognized 55 to generate a ranked list of zipstates 56 including a distance parameter associated with each zipstate signifying how poorly each zipstate was recognized as compared to the highest ranked zipstate. The highest ranked zipstate will, naturally, be the originally recognized zipstate 38. Since it is already known that no hypothesized city from the original zipstate 38 corresponds to a recognized city (if it had, the city would have been in both the ranked list of cities 47 and the hypothesized cities 40 and this step in the algorithm would not have been reached), it is the second zipstate on the ranked list that is evaluated 57. The top ranked zipstate is maintained in ranked list 56 for comparison with the other zipstates in the list to determine the distance parameter. If the distance parameter for the second zipstate is beyond an acceptable limit, e.g., if the 8 log2 confusion value (described below) of the second zipstate is more than 100 off that of the first zipstate, the package is rejected as having an invalid address and an invalid address signal is sent to the user. If the distance parameter is within the acceptable limit, the package is deemed to have a valid address and the information is delivered to second computer 25 for controlling shipment.
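The evaluation of the second-ranked zipstate can be illustrated with a short sketch. It assumes, hypothetically, that the recognizer returns a numeric score per zipstate where a larger score means a poorer match, and that the distance parameter is the difference from the top score; the function name and the scores used below are illustrative only.

```python
def evaluate_second_zipstate(ranked, limit=100):
    """Accept or reject the second-ranked zipstate.

    ranked -- list of (zipstate, score) pairs, best first; a larger score is
              assumed to mean a poorer match (an assumption of this sketch).
    limit  -- largest acceptable distance from the top-ranked score.

    Returns the second zipstate if it is within the limit, otherwise None
    (the address would then be rejected as invalid).
    """
    if len(ranked) < 2:
        return None
    top_score = ranked[0][1]
    zipstate, score = ranked[1]
    distance = score - top_score
    return zipstate if distance <= limit else None
```

For instance, `evaluate_second_zipstate([("03112MA", 0), ("02143MA", 60)])` accepts the second entry, whereas a second score of 140 would lead to rejection.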

As seen in Fig. 3, in the confusability algorithm 39, the first step is for a zip code hypothesizer 56 to hypothesize possible valid zip codes 59 for the recognized state of zipstate 38 by consulting a database 60 of zip codes and states. The probability that the recognizer would confuse the recognized zip code with each of the hypothesized valid zip codes 59 is determined 61. The top forty zip codes, i.e., those with the highest probabilities of being confused with the recognized zip code, form a list of zipstates 62 that are then rerecognized 63 against the uttered zipstate 30 in a discrete recognition creating a ranked list 64 of the top ten zipstates. It is from this ranked list 64 of zipstates that the list of hypothesized cities 40 is created by a city hypothesizer 65 which consults a database 66 of zip codes and corresponding cities.

The discrete recognition against the list of zipstates 62 has a higher recognition accuracy rate than the original continuous recognition of the uttered zipstate. Discrete recognition of the original uttered zipstate is not done because it would require recognition against a vocabulary of 5 x 10^6 zipstates.

An overview of the code is shown in Figs. 4-4b. The user utters the zipstate and city 70. The zipstate is continuously recognized and if the recognized zipstate 38 does not contain good data, i.e., five digits and a state, it is rejected as a bad address 72 and the user is informed that the address was rejected.

If the recognized zipstate contains good data 74, the confusability algorithm 39 is employed to create zipstate list 62. Referring to Figs. 4a and 4b, a propagation subroutine 78 is entered to hypothesize the zip codes which are to populate zipstate list 62.

As seen in Fig. 4b, in the propagation routine 78, a variable n corresponding to the digit place in the zip code (first, second, third, fourth, or fifth number of the zip code) is incremented 81 each time the propagate routine is entered at arrow 80. An index number i corresponding to the numbers zero through nine is incremented 82 after each loop 83 of the propagation routine. At the start of each loop 84, the index number is put into the nth digit place. The built code is checked for validity 85 against the database of valid zip codes for the recognized state. If the index number in the nth digit place is not valid, the index number is incremented and the next number is tried. If the index number in the nth digit place is valid, a confusion value 88 corresponding to how likely it would be for the recognizer to confuse the number it recognized in the nth digit place with the index number in that digit place is calculated (described further below). The code is placed in a queue 90 and the loop is repeated, incrementing the index number each time until i=9 has been checked 91 for the nth digit place. The codes are placed in queue 90 in descending order of probability.

When i=9, the propagation routine is exited. In Fig. 4a, if there are fewer than forty zip codes 93 in a top list 92, the highest probability code 94 in queue 90 is considered. If code 94 has fewer than five digits 95, the propagation routine is reentered with this code. Variable n is incremented and the process is repeated until there are 5 digits in the highest probability code. This zip code is then moved to the top list 92. The next highest probability code 94 is then taken from queue 90 for consideration. If there are five digits in this zip code, the zip code is moved to the top list. This is repeated until the code at the top of queue 90 has fewer than five digits. This partial zip code is then propagated. The confusability algorithm is repeated until there are forty zip codes in top list 92.
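The propagation procedure resembles a best-first search over partial zip codes. The sketch below is one possible reading, not the code of Figs. 4-4b: it uses Python's `heapq` module as the probability-ordered queue, and the names `top_confusable_codes`, `valid_codes`, and `confusion` are hypothetical. The matrix is indexed as `confusion[recognized][index]`, following the row and column convention given for Fig. 5.

```python
import heapq


def top_confusable_codes(recognized, valid_codes, confusion, limit=40, length=5):
    """Collect up to `limit` valid codes most likely to be confused with `recognized`.

    recognized  -- digit string produced by the recognizer, e.g. "03112"
    valid_codes -- set of valid full codes for the recognized state
    confusion   -- confusion[r][i]: probability of confusing recognized digit r
                   with index digit i
    """
    # Every prefix of a valid code counts as a valid partial code.
    valid_prefixes = {code[:k] for code in valid_codes for k in range(1, length + 1)}
    top = []
    # heapq is a min-heap, so probabilities are negated to pop the largest first.
    queue = [(-1.0, "")]
    while queue and len(top) < limit:
        neg_p, partial = heapq.heappop(queue)
        if len(partial) == length:
            top.append((partial, -neg_p))  # complete code: move to the top list
            continue
        n = len(partial)                   # digit place to fill next (0-based)
        r = int(recognized[n])
        for i in range(10):                # try index numbers zero through nine
            candidate = partial + str(i)
            if candidate not in valid_prefixes:
                continue
            p = -neg_p * confusion[r][i]   # running product over digit places
            if p > 0.0:
                heapq.heappush(queue, (-p, candidate))
    return top
```

Because each per-digit probability is at most 1, extending a partial code never raises its confusion value, so complete codes leave the queue in descending order of probability.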

Referring again to Fig. 4, the list of zipstates 62 corresponds to the forty zip codes from top list 92 combined with the recognized state. The zipstates 62 are then rerecognized by a discrete recognition 95' against the original utterance. The recognizer returns a distance parameter of how well the zipstates were recognized 96. Beyond a limit the zipstates are rejected. At most the top ten recognized zipstates form the ranked list of zipstates 64. A database of cities corresponding to zip codes is then consulted to create the list of hypothesized cities 40.

At this point in the code (though the following check could be done at any time after recognizing the original zipstate utterance) the recognized zip code is checked against the recognized state. If the recognized zip code is a valid zip code for the recognized state 97, and if there is only one shipping destination for the recognized state 98, the package is shipped to the recognized state. If either the recognized zip code is not valid for the recognized state or there is more than one shipping destination for the recognized state, the city is recognized 41 by discrete recognition forming the list of cities 42. From the list of cities 42 and the hypothesized cities 40, the ranked list of cities 47 is created as described above with reference to Fig. 2. The algorithm proceeds as described above.

Referring to Fig. 5, the confusion value corresponding to how likely it would be for the recognizer to confuse the recognized number in the nth digit place with the index number in the nth digit place is calculated from a confusability matrix of probabilities. The rows of the matrix correspond to the recognized number and the columns of the matrix correspond to the index number. For example, the probability of confusing the recognized number 4 with the index number 2 is 0.0216. As the number of digit places increases, the probability is calculated by multiplying the probabilities for each digit place. For example, the probability of confusing the recognized number 44 with 22 is 0.0216 x 0.0216 = 0.00047; the probability of confusing 442 with 223 is 0.00047 x 0.1888.
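The multiplication over digit places can be written out as a small sketch. The helper name `confusion_probability` is hypothetical, and the matrix is represented here as a sparse nested dictionary holding only the two entries quoted in the text (0.0216 for recognized 4 and index 2, 0.1888 for recognized 2 and index 3).

```python
from math import prod


def confusion_probability(recognized, candidate, confusion):
    """Multiply the per-digit confusion probabilities of two digit strings.

    confusion[r][i] is the probability of confusing recognized digit r with
    index digit i (rows are recognized numbers, columns are index numbers).
    """
    if len(recognized) != len(candidate):
        raise ValueError("strings must have the same number of digits")
    return prod(confusion[int(r)][int(i)] for r, i in zip(recognized, candidate))


# Only the two matrix entries quoted in the text are filled in.
confusion = {4: {2: 0.0216}, 2: {3: 0.1888}}
p_44_22 = confusion_probability("44", "22", confusion)      # 0.0216 * 0.0216
p_442_223 = confusion_probability("442", "223", confusion)  # ... * 0.1888
```

Rounded to five decimal places, `p_44_22` gives the 0.00047 stated above.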

The confusability matrix is experimentally determined by having predetermined numbers read off to the speech recognizer and calculating how often the number is incorrectly recognized and what number it is incorrectly recognized as. The confusability matrix may be customized for an individual user. The confusability matrix is shown in Fig. 5 as a matrix of probabilities for ease of description; the code utilizes a matrix of natural logs of the probability.
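Working with natural logs turns the product over digit places into a sum. The following sketch shows the equivalence under assumed names (`to_log_matrix`, `log_confusion_value`) and a sparse dictionary matrix; it is an illustration and does not reproduce the actual code referred to above.

```python
import math


def to_log_matrix(confusion):
    """Convert a nested dict of probabilities to natural logs (zero maps to -inf)."""
    return {
        r: {i: (math.log(p) if p > 0 else float("-inf")) for i, p in row.items()}
        for r, row in confusion.items()
    }


def log_confusion_value(recognized, candidate, log_matrix):
    """Sum per-digit log probabilities; equals the log of the product."""
    return sum(log_matrix[int(r)][int(i)] for r, i in zip(recognized, candidate))


log_matrix = to_log_matrix({4: {2: 0.0216}})
log_value = log_confusion_value("44", "22", log_matrix)  # 2 * ln(0.0216)
```

Exponentiating `log_value` recovers the product 0.0216 x 0.0216 up to floating-point rounding. Summing logs is a common way to avoid very small products when many symbols are combined, although the text does not state the reason for the choice.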

EXAMPLE

Utterance: 02143Massachusetts Somerville
Zipstate recognized: 03112MA
Confusability algorithm:

Referring to Fig. 6, in which the calculation of the confusion values is shown for each digit place, in the first digit place, n=1, 0 is the only valid number for Massachusetts zip codes. Queue 90 becomes:



n    i    zip code    confusion value
1    0    0----       1



There are fewer than 40 zip codes in top list 92; 0---- is pulled from queue 90 as the highest probability code in the queue; there are fewer than five digits in the pulled code; the propagation routine is reentered.
In the second digit place, n=2, 1 and 2 are the only valid digits. Queue 90 becomes:



n    i    zip code    confusion value
2    2    02---       0.0497
2    1    01---       0.0001



02--- is pulled from queue 90 as the highest probability code and the propagation routine is reentered.
In the third digit place, n=3, 0 to 7 are valid digits. Queue 90 becomes:



n    i    zip code    confusion value
3    1    021--       0.04970
3    4    024--       0.00456
3    5    025--       0.00365
3    7    027--       0.00209
3    0    020--       0.00077
2    1    01---       0.00010



For i = 2, 3 and 6 the confusion value is 0.0497 x 0.0001, which is so low that these codes get dropped from the queue.

021-- is pulled from queue 90 as the highest probability code and the propagation routine is reentered.
In the fourth digit place, n=4, 0 to 9 are valid digits. Queue 90 becomes:



n    i    zip code    confusion value
4    1    0211-       0.04970
4    4    0214-       0.00456
3    4    024--       0.00456
4    5    0215-       0.00365
3    5    025--       0.00365
4    7    0217-       0.00209
3    7    027--       0.00209
4    9    0219-       0.00150
4    0    0210-       0.00077
3    0    020--       0.00077
2    1    01---       0.00010



0211- is pulled from queue 90 as the highest probability code and the propagation routine is reentered.
In the fifth digit place, n=5, 0 to 9 are valid digits. Queue 90 becomes:

n    i    zip code    confusion value
5    2    02112       0.04970
5    3    02113       0.00940
5    6    02116       0.00600
4    4    0214-       0.00456
3    4    024--       0.00456
5    0    02110       0.00365
4    5    0215-       0.00365
3    5    025--       0.00365
5    7    02117       0.00260
5    8    02118       0.00209
4    7    0217-       0.00209
3    7    027--       0.00209
5    4    02114       0.00200
5    9    02119       0.00150
4    9    0219-       0.00150
5    5    02115       0.00140
5    1    02111       0.00100
4    0    0210-       0.00077
3    0    020--       0.00077
2    1    01---       0.00010



There are fewer than 40 zip codes in top list 92; 02112 is pulled from queue 90 as the highest probability code; 02112 has five digits so it is moved to top list 92; 02113 is pulled from queue 90 and moved to the top list; 02116 is pulled from queue 90 and moved to the top list; 0214- is pulled from queue 90 and propagated.
Again, in the fifth digit place, n=5, 0 to 9 are valid digits. Queue 90 becomes:



n    i    zip code    confusion value
5    2    02142       0.00456
3    4    024--       0.00456
5    0    02110       0.00365
4    5    0215-       0.00365
3    5    025--       0.00365
5    7    02117       0.00260
5    8    02118       0.00209
4    7    0217-       0.00209
3    7    027--       0.00209
5    4    02114       0.00200
5    9    02119       0.00150
4    9    0219-       0.00150
5    5    02115       0.00140
5    1    02111       0.00100
5    3    02143       0.00086
4    0    0210-       0.00077
3    0    020--       0.00077
5    6    02146       0.00055
5    0    02140       0.00033
5    7    02147       0.00020
5    8    02148       0.00019
5    4    02144       0.00018
5    9    02149       0.00014
5    5    02145       0.00013
2    1    01---       0.00010



02142 is moved to the top list and 024-- is pulled from queue 90 and propagated. The process is continued until there are forty zip codes in top list 92. The top 30 zip codes are shown in Fig. 7.

Discrete recognition of the top forty zipstates yields four zipstates in ranked order, and corresponding cities 40 are hypothesized:



zipstates    hypothesized cities 40
02143MA      Somerville
02147MA      Brookline Village
02112MA      Essex Station
02113MA      Hanover



Cities 42 are discretely recognized from the utterance: 

Sagamore 

Somerville 

Sudbury 

Salisbury 

Sheldonville 

City list 44 is formed:

Somerville
Brookline Village
Essex Station
Hanover
Sudbury
Sagamore
Salisbury
Sheldonville

Ranked city list 47 is formed from discrete rerecognition of the city:

Sagamore
Somerville
Sudbury
Salisbury
Sheldonville

Comparison of hypothesized cities 40 to ranked list 47 shows Somerville as the common city. The package is shipped 
to Somerville. 

Factory tests of the invention for sorting packages have resulted in a package rejection rate for invalid address of about 2% and a misshipping rate of about 0.1%; whereas, factory tests of the invention without employing the confusability algorithm, i.e., hypothesizing cities only from the one recognized zipstate, resulted in a package rejection rate of about 5-8% and a misshipping rate of about 0.1%.

Referring to Fig. 8, a similar hypothesis algorithm can be used to recognize spoken letters when spelling a word. The word 100 is first uttered and recognized 101 by discrete recognition. If the recognizer gets the word wrong, the user so indicates to the recognizer and then utters the spelling of the word 102. The spoken string of letters is recognized 104 by continuous recognition. The confusability algorithm 106 uses the probability that a recognized letter 105 would be confused with a letter of the alphabet. As in the zip code example, a ranked list of letter combinations 107 is formed (similar to ranked zipstate list 64). In creating the ranked list of letter combinations, a dictionary database 108 is consulted to validate that each letter combination is capable of forming a valid word (similar to checking for valid zip codes). The top forty valid letter combinations are then discretely rerecognized against the spoken string of letters to form ranked list 107. A list of words 110 is then hypothesized from the ranked list of letter combinations (similar to hypothesized cities 40).

The hypothesized words 110 and a list of words 111 formed by discrete recognition of the uttered word 100 are merged 112 to form a word list 113. Word list 113 is discretely rerecognized 114 against the spoken word and a ranked list of words 115 is formed. Ranked list of words 115 is compared 116 to hypothesized words 110, the top ranked word being compared first. The first ranked word that shows up in the hypothesized words 110 is chosen 117.

If the chosen word is incorrect the user can reject it and the next ranked word that shows up in the hypothesized words 110 is chosen. This process can be continued until the correct word is chosen or the ranked list is exhausted. Alternatively, word comparer 116 can create a choice list 117' containing all words in common between ranked list of words 115 and hypothesized words 110, which can be displayed to the user. If the correct word is contained within the choice list, the user can select the word either using the keyboard or with a spoken command. For example, if the misrecognized uttered word 100 is "in", a choice list may appear as:

in:

in
an
ink
imp
and
amp

with the first ranked word appearing above the list and within the list. If the user then continues speaking without choosing from the choice list, the first ranked word is automatically chosen.
If the ranked list is exhausted without finding a common word or the correct word, possible spellings 119 are hypothesized 118 from the ranked list of words 115. The recognized spelling 104 is merged 120 with the hypothesized spellings 119 and the resulting spelling list 121 is discretely rerecognized 122 against the original spelling utterance 102 to form a ranked list of spellings 123. The second spelling on the ranked list is evaluated 124. If the distance parameter returned by the recognizer is beyond the acceptable limit, the spelling is rejected; if it is within the acceptable limit, that word is chosen. If the user rejects the spelling, the next spelling on the list can be considered. Alternatively, as described above, a choice list 123' containing all but the first ranked spelling in the ranked list of spellings 123 can be displayed to the user for selection.

Referring to Fig. 10, a particular embodiment of a hypothesis algorithm for recognizing spelled word prefixes is shown. Symbols which may be included in the prefix are generally letters, numbers, and punctuation (e.g., hyphen, period, comma, brackets). The word 232 is first uttered and recognized 241 by discrete recognition. If word 232 is incorrectly recognized, the user utters the spelling of the prefix of the word 230 (generally 3 to 5 symbols; the prefix may be part or all of the word). The spelled prefix is recognized 234 by continuous recognition. The confusability algorithm 239 uses the probability that the symbols in recognized prefix 238 would be confused with other symbols to hypothesize a ranked list of prefixes 264. A distance parameter associated with each prefix signifies how poorly each prefix was recognized as compared to the highest ranked prefix. A top ranked prefix 242 determined by discrete recognition of the uttered word 232 is merged 244 with prefix list 264 to form merged prefix list 245. Word hypothesizer 265 hypothesizes a word list 240 by consulting a database of prefixes and corresponding words 266.

Word list 240 is discretely rerecognized 246 against the uttered word 232 to form a ranked list of words 247 including
a distance parameter associated with each word signifying how poorly each word was recognized as compared to the
highest ranked word. The distance parameters associated with prefixes 264 are then added 248 to the distance
parameters of words 247 that include corresponding prefixes, and ranked list 247 is reranked 249 according to the new
distance parameters. This aids in the recognition of homophones because it gives a higher ranking to the homophone
having a higher ranked spelled prefix. Reranked word list 249 is displayed to the user as a choice list. If the correct
word is contained within the choice list, the user can select the word either using the keyboard or with a spoken
command.
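The reranking step 248, 249 can be sketched as follows. This is a minimal illustration only, not the code of the appendices; the function name, the dictionary layout, the rule for words matching several prefixes, and the distance values in the usage note are all assumptions made for the example.

```python
def rerank_words(word_distances, prefix_distances):
    """Add each prefix's distance to the words that begin with it, then re-sort.

    word_distances:   {word: distance from rerecognition against the uttered word}
    prefix_distances: {prefix: distance from rerecognition against the uttered prefix}
    Smaller distances mean better matches.
    """
    combined = {}
    for word, word_dist in word_distances.items():
        # Assumption: if several prefixes match, the best (smallest) distance is used;
        # a word with no matching prefix keeps its own distance unchanged.
        matches = [d for p, d in prefix_distances.items()
                   if word.upper().startswith(p)]
        combined[word] = word_dist + min(matches, default=0.0)
    # Sort by combined distance; ties are broken alphabetically.
    return sorted(combined, key=lambda w: (combined[w], w))
```

With hypothetical distances {"postage": 0, "dosage": 10, "boscage": 30} for the words and {"BOS": 0, "POS": 40, "DOS": 5} for the prefixes, the combined distances are 40, 15 and 30, so "dosage" moves to the top of the choice list.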

As seen in Fig. 11, in the confusability algorithm 239, a prefix hypothesizer 258 hypothesizes a list of prefixes 262
based on the probability that recognizer 234 would confuse recognized prefix 238 with other word prefixes. A database
260 of valid prefixes or words is used as a filter to form a list of valid prefixes 262. Database 260 can be, e.g., a list of
valid prefixes, a dictionary, or an n-letter language model. The top one-hundred and fifty prefixes, i.e., those with the
highest probabilities of being confused with the recognized prefix 238, form the list of prefixes 262. The list of prefixes
262 is then rerecognized 263 against the uttered prefix 230 in a discrete recognition which filters out poorly confused
prefixes and creates a ranked list 264 of the top fifteen to twenty prefixes. It is this ranked list 264 of prefixes that is
merged with the top ranked prefix from the discrete recognition of the uttered word to form the merged prefix list 245
from which the list of hypothesized words 240 is created.

An overview of the code is shown in Figs. 12-12b. The user utters a word and, if the recognizer misrecognizes the
word, the user then utters the prefix 270. The confusability algorithm 239 is then employed to create prefix list 262.
Referring to Figs. 12a and 12b, a propagation subroutine 278 is entered to hypothesize the prefixes which are to
populate prefix list 262.

As seen in Fig. 12b, in the propagation routine 278, a variable n corresponding to the digit place in the prefix (first,
second, third, fourth, or fifth place in a prefix having five symbols) is incremented 281 each time the propagate routine
is entered at arrow 280 (unless the prior propagation was an insertion 380, as described below, in which case n is not
incremented 361). Alternatively, variable n can be incremented at the end of each permutation loop and deletion and
kept constant at the end of each insertion loop. Propagation routine 278 includes three sub-routines: a permutation
routine 378, an insertion routine 478 and a deletion routine 578. An index value i (i = 0 to the number of symbols, i = 0
to 38 in the example that follows) corresponding to the symbols letters a through z, digits zero through nine, and
punctuation, in that order, is incremented 382, 482 after each loop 383, 483 of the sub-routines. As described more
fully below, confusability is determined in the context of the preceding symbol in the prefix.

At the start 384 of permutation routine 378, the index value i is put into the nth digit place and checked for validity
385 against the database of valid prefixes 260. If the index value in the nth digit place is not valid, the index value is
incremented and the next value is tried. If the index value in the nth digit place is valid, a confusion value 388
corresponding to how likely it would be for the recognizer to confuse the value it recognized in the nth digit place with the
index value in that digit place is calculated (described further below). The prefix is placed in a queue 390 and the loop
is repeated, incrementing the index value each time, until i=38 has been checked 391 for the nth digit place. The codes
are placed in queue 390 in descending order of probability.

When i=38 in permutation routine 378, if the last loop of propagation routine 278 entered for this build was not a
deletion loop 480, insertion routine 478 is entered 484, i is set to zero and put into the nth digit place (insertion loops
do not follow deletion loops and deletion loops do not follow insertion loops because to do so would cancel out their
effect). The built prefix is checked for block permutation 486 and for validity 485. If the index value in the nth digit place
is valid, a confusion value 488 is calculated and the prefix is placed in queue 390. Loop 483 is repeated, incrementing
the index value each time, until i=38 has been checked 491 for the nth digit place.

When i=38 in insertion routine 478, if the last loop of propagation routine 278 entered for this build was not an
insertion loop 580, deletion routine 578 is entered 584. A confusion value 588 corresponding to how likely it would be
for the recognizer to confuse the value it recognized in the nth digit place with the absence of a symbol in the nth digit
place is calculated. The prefix is placed in queue 390.

At the end of the propagation routine, the prefix that the propagation routine is entered with (described more fully
below) is checked for inclusion of a block permutation 686. A block permutation occurs if the built prefix includes LU,
LE, MU or ME. The letter W is substituted for these letter combinations and checked for validity 685 against the database
of valid prefixes 260. If it is a valid prefix it is placed in queue 390.
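The block permutation check can be illustrated with a short sketch. It is not taken from the appendices: the function name is hypothetical, and representing database 260 as a plain set of prefix strings is an assumption made for the example.

```python
# Letter pairs that the text says may be confused with the single letter W.
BLOCKS = ("LU", "LE", "MU", "ME")

def block_permutations(prefix, valid_prefixes):
    """Return the valid prefixes obtained by replacing one block with W."""
    results = []
    for block in BLOCKS:
        start = prefix.find(block)
        while start != -1:
            # Substitute W for this occurrence of the two-letter block.
            candidate = prefix[:start] + "W" + prefix[start + len(block):]
            # Assumption: validity is simple membership in a set of prefixes.
            if candidate in valid_prefixes and candidate not in results:
                results.append(candidate)
            start = prefix.find(block, start + 1)
    return results
```

For instance, with a hypothetical valid-prefix set containing "AW", the built prefix "ALE" yields the additional candidate "AW".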

Propagation routine 278 is then exited. Referring to Fig. 12a, the highest probability prefix 294 in queue 390 is
considered. If the value of n from the prior propagation of prefix 294 is less than the number of symbols in recognized
prefix 238, the propagation routine is reentered with this prefix. If the last build for this prefix was not an insertion,
variable n is incremented and the propagation routine reentered. If the last build for this prefix was an insertion, the
propagation routine is reentered without incrementing variable n. The process is repeated until the value of n of the
highest probability prefix in queue 390 is equal to the number of symbols in recognized prefix 238. This prefix is then
moved to the top list 292. The next highest probability prefix 294 is then taken from queue 390 for consideration. The
confusability algorithm is repeated until queue 390 is exhausted or there are one-hundred and fifty prefixes 293 in top
list 292.
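The queue-driven search described above behaves like a best-first search over partially built prefixes. The following is a simplified sketch under stated assumptions, not the routine of Figs. 12-12b: insertions and block permutations are omitted for brevity, and `perm_prob`, `del_prob` and `is_valid` are hypothetical callables standing in for the confusability matrices and database 260.

```python
import heapq

def hypothesize_prefixes(recognized, perm_prob, del_prob, is_valid, limit=150):
    """Best-first expansion of prefix hypotheses (permutations and deletions only).

    perm_prob(rec, context) -> {symbol: probability of confusing rec with symbol}
    del_prob(rec, context)  -> probability that rec corresponds to no symbol
    is_valid(prefix)        -> True if prefix passes the valid-prefix filter
    """
    # heapq is a min-heap, so probabilities are stored negated.
    queue = [(-1.0, 0, "")]
    top_list, seen = [], set()
    while queue and len(top_list) < limit:
        neg_p, n, built = heapq.heappop(queue)
        if n == len(recognized):
            # Every digit place is accounted for; repeats of a prefix already
            # in the top list arrive with a lower value and are dropped.
            if built not in seen:
                seen.add(built)
                top_list.append((built, -neg_p))
            continue
        rec = recognized[n]
        context = built[-1] if built else None  # None stands for silence
        for symbol, p in perm_prob(rec, context).items():
            candidate = built + symbol
            if p > 0 and is_valid(candidate):
                heapq.heappush(queue, (neg_p * p, n + 1, candidate))
        p = del_prob(rec, context)
        if p > 0:
            heapq.heappush(queue, (neg_p * p, n + 1, built))
    return top_list
```

Because every confusion value is at most 1, extending a prefix can only lower its probability, so completed prefixes leave the queue in descending order of probability and the search can stop as soon as the top list is full.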

Referring again to Fig. 12, the list of prefixes 262 corresponds to the one-hundred and fifty prefixes from top list
292. The prefixes 262 are then rerecognized by a discrete recognition 295 against the original prefix utterance 230.
The recognizer returns a distance parameter with each prefix related to how well the prefixes were recognized. At most
the top fifteen to twenty recognized prefixes form the ranked list of prefixes 264. The top prefix 242 from discrete
recognition 241 of uttered word 232 is combined with prefix list 264. A database of words corresponding to prefixes is
then consulted to create the list of hypothesized words 240. The ranked list of words 247 is created as described above
with reference to Fig. 10. The algorithm proceeds as described above.

Referring to Figs. 13-13b, a few examples of confusability matrices are shown. Fig. 13 includes confusion values
for permutations, Fig. 13a includes confusion values for insertions, and Fig. 13b includes confusion values for deletions.
The confusability matrices are shown as matrices of probabilities for ease of description; the code utilizes matrices of
natural logs of the probability. The symbols hyphen, period and comma are not shown.

The confusion value corresponding to how likely it would be for the recognizer to confuse the recognized symbol
in the nth digit place with the index value in the nth digit place is determined in the context of the symbol in the preceding
digit place (silence if n=1). The rows of the matrix correspond to the symbol context, the columns of the matrix
correspond to the index value, and the header symbol 510 corresponds to the recognized symbol. There are 39 such
matrices, one for each symbol, for each of permutation, insertion and deletion, for a total of 117 matrices or 39
three-dimensional matrices.

For example, in the case of a permutation, the probability of confusing the recognized letter B with the index value
D in the context of silence is 0.011988 and the probability of confusing the recognized letter B with the index value E
in the context of A is 0.017751. The probability of confusing the recognized letter O with the index value 4 in the context
of T is 0.000253. As the number of digit places increases, the probability is calculated by multiplying the probabilities
for each digit place. For example, the probability of confusing the recognized prefix BO with TA is 0.001323 x 0.001831
= 0.000002422. The confusability matrices are experimentally determined as described below.
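The multiplication above, and its equivalent as a sum of natural logs, can be checked with a short sketch. The function names are illustrative only and do not come from the appendices.

```python
import math

def prefix_confusion(per_place_probs):
    """Probability of confusing a whole prefix: product over the digit places."""
    return math.prod(per_place_probs)

def prefix_log_confusion(per_place_probs):
    """Same quantity in the log domain: sum of natural logs over the digit places."""
    return sum(math.log(p) for p in per_place_probs)

# Values quoted in the text for confusing recognized prefix BO with TA.
bo_with_ta = [0.001323, 0.001831]
print(f"{prefix_confusion(bo_with_ta):.9f}")
print(math.exp(prefix_log_confusion(bo_with_ta)))
```

Summing logs gives the same ranking as multiplying probabilities while avoiding underflow when many small values are combined, which is one plausible reason for storing log matrices.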

EXAMPLE

Uttered word: Dosage
List of recognized words: Postage, Post, Postal, Buffalo, Dosing

Spelled prefix: DOS
Prefix recognized: BOS
Confusability algorithm:

Referring to Fig. 14, the calculation of the confusion values is shown for each digit place (for simplicity, only those
symbols having confusion values greater than 0.01 are shown; generally the cut-off value is 0.000010).
In the first digit place, n=1, in the context of silence, the valid prefixes are put in queue 390:



n    operation    prefix     confusion value
1    P            B          0.655987
1    D            (empty)    0.072822
1    P            D          0.011988
1    P            E          0.011813
1    I            E          0.010596

(P = permutation, D = deletion, I = insertion)



P B is pulled from queue 390 as the highest probability prefix in the queue; n is less than 3 and the prior operation
was not an insertion, so n is incremented and the propagation routine is reentered. Queue 390 becomes:



n    operation    prefix     confusion value
2    P            BO         0.405482
2    D            B          0.095995
1    D            (empty)    0.072822
2    I            BB         0.055001
2    I            BO         0.050121
2    P            BB         0.015694
1    P            D          0.011988
1    P            E          0.011813
1    I            E          0.010596
2    P            BW         0.010545



P BO is pulled from queue 390 as the highest probability prefix in the queue; n is less than 3 and the prior operation
was not an insertion, so n is incremented and the propagation routine is reentered. Queue 390 becomes (only those possible
prefixes having confusion values greater than 0.002 are shown for clarity):



n    operation    prefix     confusion value
3    P            BOS        0.216691
2    D            B          0.095995
1    D            (empty)    0.072822
3    I            BOO        0.064946
3    D            BO         0.059754
2    I            BB         0.055001
2    I            BO         0.050121
3    I            BOS        0.017041
3    P            BOO        0.016986
2    P            BB         0.015694
1    P            D          0.011988
1    P            E          0.011813
1    I            E          0.010596
2    P            BW         0.010545
3    P            BOA        0.004753



P BOS is pulled from queue 390 as the highest probability prefix in the queue; n equals 3, so BOS is moved to top
list 292. D B is pulled from queue 390 as the highest probability prefix in the queue; n is less than 3 and the prior operation
was not an insertion, so n is incremented and the propagation routine is reentered. Queue 390 becomes:



n    operation    prefix     confusion value
1    D            (empty)    0.072822
3    I            BOO        0.064946
3    D            BO         0.059754
2    I            BB         0.055001
2    I            BO         0.050121
3    I            BOS        0.017041
3    P            BOO        0.016986
2    P            BB         0.015694
1    P            D          0.011988
1    P            E          0.011813
1    I            E          0.010596
2    P            BW         0.010545
3    D            B          0.009121
3    P            BOA        0.004753



The D entry with the empty prefix is pulled from queue 390 as the highest probability prefix in the queue; n is less than 3
and the prior operation was not an insertion, so n is incremented and the propagation routine is reentered. Queue 390 becomes:
n    operation    prefix     confusion value
3    I            BOO        0.064946
2    P            O          0.063608
3    D            BO         0.059754
2    I            BB         0.055001
2    I            BO         0.050121
3    I            BOS        0.017041
3    P            BOO        0.016986
2    P            BB         0.015694
1    P            D          0.011988
1    P            E          0.011813
1    I            E          0.010596
2    P            BW         0.010545
3    D            B          0.009121
2    D            (empty)    0.006324
3    P            BOA        0.004753



I BOO is pulled from queue 390 as the highest probability prefix in the queue; n equals 3, so BOO is moved to top
list 292. P O is pulled from queue 390 as the highest probability prefix in the queue; n is less than 3 and the prior
operation was not an insertion, so n is incremented and the propagation routine is reentered. Queue 390 becomes:



n    operation    prefix     confusion value
3    D            BO         0.059754
2    I            BB         0.055001
2    I            BO         0.050121
3    P            OS         0.034306
3    I            BOS        0.017041
3    P            BOO        0.016986
2    P            BB         0.015694
1    P            D          0.011988
1    P            E          0.011813
1    I            E          0.010596
2    P            BW         0.010545
3    I            OO         0.010188
3    D            O          0.009374
2    D            B          0.009121
2    D            (empty)    0.006324
3    P            BOA        0.004753
3    I            OS         0.002673
3    P            OO         0.002665



D BO is pulled from queue 390 as the highest probability prefix in the queue; n equals 3, so BO is moved to top list
292. I BB is pulled from queue 390 as the highest probability prefix in the queue; n is less than 3 and the prior operation
was an insertion, so n is not incremented; the propagation routine is reentered and does not yield any valid prefixes. I BO
is pulled from queue 390 as the highest probability prefix in the queue; n is less than 3 and the prior operation was an
insertion, so n is not incremented and the propagation routine is reentered. Queue 390 becomes:



n    operation    prefix     confusion value
3    P            OS         0.034306
2    P            BOO        0.028937
3    I            BOS        0.017041
3    P            BOO        0.016986
2    P            BB         0.015694
1    P            D          0.011988
1    P            E          0.011813
1    I            E          0.010596
2    P            BW         0.010545
3    I            OO         0.010188
3    D            O          0.009374
2    D            B          0.009121
2    I            BOO        0.008720
2    D            (empty)    0.006324
3    P            BOA        0.004753
3    I            OS         0.002673
3    P            OO         0.002665



P OS is pulled from queue 390 and moved to top list 292. P BOO is pulled from queue 390 as the highest probability
prefix in the queue; n is less than 3 and the prior operation was not an insertion, so n is incremented and the propagation
routine is reentered. Queue 390 becomes:

n    operation    prefix     confusion value
3    I            BOS        0.017041
3    P            BOO        0.016986
2    P            BB         0.015694
3    P            BOOS       0.015607
1    P            D          0.011988
1    P            E          0.011813
1    I            E          0.010596
2    P            BW         0.010545
3    I            OO         0.010188
3    D            O          0.009374
2    D            B          0.009121
2    I            BOO        0.008720
2    D            (empty)    0.006324
3    P            BOA        0.004753
3    I            BOOO       0.004635
3    D            BOO        0.004264
3    I            OS         0.002673
3    P            OO         0.002665



I BOS is pulled from queue 390 as the highest probability prefix in the queue; n equals 3, but this is a repeat of a prefix
already in the top list with a lower value, so it is dropped. P BOO is pulled from queue 390 and similarly dropped. Fig. 14
follows the propagation routine through P DO. The process is continued until queue 390 is exhausted or there are one
hundred and fifty prefixes in top list 292. The top 13 prefixes are shown in Fig. 15.

Discrete rerecognition of the top one hundred and fifty prefixes against the uttered prefix 230 yields three prefixes 
in ranked order. If the top recognized prefix 242 from the original discrete recognition of the uttered word (POS) is 
already included in ranked list 264 it is not added. Corresponding words 240 are hypothesized: 



prefixes    hypothesized words 240
BOS         boscage, boskage, bosh, bosk, etc.
POS         posada, pose, Poseidon, poser, etc.
DOS         dos, dosage, dose, dosido, dosimeter, doss, dossal, dossier, dost



Ranked word list 247 is formed from discrete rerecognition against the uttered word 232: 

postage
dosage 
boscage 
boskage 
dossal 
dossier 
poser 

The distance parameters associated with prefixes 264 are then added to the distance parameters of words 247 that 
include corresponding prefixes and ranked list 247 is reranked according to the new distance parameters. The reranked 
list is then displayed to the user for selection. 

Referring to Fig. 17, a user interface 700 permitting a user to select from ranked word list 247 is shown. Here, the 
word was disco, the recognizer recognized fiscal, the user spelled DIS, and disco appeared as choice 9 in list 247. 
Selecting 9 on a keyboard (not shown) replaces fiscal with disco. 

The confusability matrices are experimentally determined by having a user spell predetermined words to the speech
recognizer and comparing the recognized spelling to the known spelling. Referring to Figs. 16-16a, the training
algorithm prompts the user to spell BED. The user spells BED and the recognizer recognizes BET. Baum-Welch training
in the context of Hidden Markov Models is a standard technique in speech recognition. See, for example, Rabiner, L.
R., and Juang, B.H., "Introduction to Hidden Markov Models," IEEE ASSP, pp. 4-16, January 1986. The invention
utilizes a modification of Baum-Welch training to determine the probability of permutations, insertions and deletions
for each symbol in the context of each symbol.

EP 0 762 385 A2
Permutations build diagonally, deletions horizontally and insertions vertically. Permutations feed all three nodal
points, deletions feed permutation and deletion nodal points, and insertions feed permutation and insertion nodal points.
The points in each nodal point are arranged as (insertion, permutation, deletion). Permutations have an additional
factor of 10 if the letter is correct and a factor of 2 if the letter is not correct. In the forward alpha direction of Fig. 16,
the silence node starts with values of (1, 1, 1) for (insertion, permutation, deletion) respectively. All other nodal points
start with values of (0, 0, 0). In the backward beta direction of Fig. 16a, the lower right hand node starts with values of
(1, 1, 1) and all other nodal points start with values of (0, 0, 0). The forward and backward diagrams are used to calculate
confusability probabilities as follows:



node    context    prompt    recog.    alpha    beta    calculation
ins     Sil        B         B         1        22      (1/260*22/260)*10
per     Sil        B         B         1        24      (1/260*24/260)
del     Sil        B         (none)    1        22      (1/260*22/260)
ins     B          E         E         10       2       (10/260*2/260)*10
per     B          E         E         10       2       (10/260*2/260)
del     B          E         (none)    10       2       (10/260*2/260)
ins     B          E         B         1        2       (1/260*2/260)
per     B          E         B         1        4       (1/260*4/260)
del     B          E         (none)    0        4       (0/260*4/260)
ins     Sil        B         T         0        1       (0/260*1/260)
per     Sil        B         T         1        1       (1/260*1/260)
del     Sil        B         (none)    1        0       (1/260*0/260)
ins     E          D         E         12       0       (12/260*0/260)
per     E          D         E         12       1       (12/260*1/260)
del     E          D         (none)    2        1       (2/260*1/260)



Repetitive probabilities for the same node (operation), context, prompt and recognized symbol are added to
determine the probability of confusing the recognized symbol with the prompt. This procedure is carried out until the
proportions between the confusions become more or less stable, indicating that a majority of confusion phenomena
have been captured. Anomalies in the data can be smoothed by looking at related confusions, i.e., the probability of
confusing e with a in the context of b is related to the probability of confusing a with e in the context of b. It is useful to
have several speakers participate in the training process to capture a large variety of possible confusion errors. The
resulting matrix may be customized for a particular user.
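The accumulation of repeated entries can be sketched as below. This is an illustration under stated assumptions rather than the training code itself: the row layout, the function name, and the treatment of 260 as a fixed normalizing constant and of the extra multiplier as a per-row `factor` are all assumptions drawn from the form of the calculations listed above.

```python
from collections import defaultdict

def accumulate_confusions(rows, norm=260):
    """Sum (alpha/norm) * (beta/norm) * factor over rows that share a key.

    Each row is (node, context, prompt, recognized, alpha, beta, factor), where
    node is "ins", "per" or "del" and factor is the extra multiplier, if any.
    """
    totals = defaultdict(float)
    for node, context, prompt, recognized, alpha, beta, factor in rows:
        key = (node, context, prompt, recognized)
        # Repeated observations of the same key are added together.
        totals[key] += (alpha / norm) * (beta / norm) * factor
    return dict(totals)
```

Feeding the same (node, context, prompt, recognized) combination from several training words simply increases its total, so frequently observed confusions end up with larger values.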

Speech recognition in accordance with the invention can be performed whenever there are two or more utterances
that include related data, e.g., knowing the city and state, the zip code is known; knowing the spelling of a word, the
word is known. Other examples of related data are inventory name and number, flight origination/destination and flight
number. A database containing the relationships between the data is consulted to hypothesize one set of data from
the other. Referring to Fig. 9, a first utterance 150 is recognized 151 by continuous recognition. A confusability algorithm
153 creates a ranked list including entries that the recognizer would be likely to confuse with the recognized utterance
152 and hypothesizes a list of second utterances 155 by consulting a database 154.

A discrete recognizer 157 recognizes a second utterance 156 (second utterance 156 contains data redundant with
first utterance 150) by recognizing the utterance against a database 158 and creates a list of possible second utterances
159. The hypothesized list of second utterances 155 and the list of possible second utterances 159 are merged 160
to form a second utterance list 161. Second utterance list 161 is discretely rerecognized 162 against the second
utterance 156 and a ranked list of second utterances 163 is formed. Ranked list of second utterances 163 is compared
164 to hypothesized second utterances 155, the top ranked second utterance being compared first. The first ranked
second utterance that shows up in the hypothesized second utterances 155 is chosen 165.
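The comparison 164 and choice 165 amount to finding the first common entry between the two lists. A minimal sketch follows; the function name and the city names in the usage note are hypothetical and serve only to illustrate the step.

```python
def choose_second_utterance(ranked, hypothesized):
    """Return the highest ranked second utterance that was also hypothesized.

    ranked:       second utterances ordered best first (from rerecognition)
    hypothesized: second utterances derived from the first utterance
    Returns None when the ranked list is exhausted without commonality.
    """
    hypothesized_set = set(hypothesized)
    for candidate in ranked:  # the top ranked candidate is compared first
        if candidate in hypothesized_set:
            return candidate
    return None
```

For example, with a hypothetical ranked list ["Boston", "Austin", "Houston"] and hypothesized list ["Austin", "Houston"], the function returns "Austin"; a result of None corresponds to the case handled in the next paragraph.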

If the choice is incorrect or the ranked list is exhausted without finding commonality, possible first utterances are
hypothesized 166 from the ranked list of second utterances 163 by consulting a database 167 to form a list of
hypothesized first utterances 168. The recognized first utterance 152 is merged 169 with the hypothesized first utterances
168 and the resulting first utterance list 170 is discretely rerecognized 171 against the original first utterance 150 to
form a ranked list of first utterances 172. The second first utterance on the ranked list is evaluated 173.

The speech recognizer is able to inform the user of bad data, i.e., no commonality was found between ranked list
of second utterances 163 and hypothesized second utterances 155, and the second first utterance on the ranked list
172 has a distance parameter that is beyond the acceptable limit. Appendix A is the source code in C++, including
header files, for the city/state/zip sort application. Appendix B includes additional source code in C++, including header
files, for the prefix/word application. As compared to the city/state/zip application, in the prefix/word application, the
hypo.h, hashalfa.h and hypo.cpp files have been modified, choice.h has replaced wapp.h and choice.cpp has replaced
wapp.cpp, and trie.h and trie.cpp have been added.

Other embodiments are within the scope of the following claims. The first and second utterances can be two parts
of one utterance. The recognition system can be used with an optical character recognizer instead of a speech
recognizer.

Claims

1. A method of speech recognition including:

recognizing a first utterance,
recognizing a second utterance having information that is related to said first utterance, and
determining the most probable first and second utterances based on stored information about valid
relationships between possible first and second utterances.

2. The method of claim 1 further including determining validity of one of the recognized utterances and including an
invalid utterance in a list of possible utterances for comparison with possible utterances in said list.

3. The method of claim 1 wherein determining the most probable utterances includes rerecognition of one of said
utterances against a list of possible utterances.

4. The method of claim 3 further including ranking said list of possible utterances based upon how closely each of
the possible utterances corresponds to said one of said utterances.

5. The method of claim 1 further including creating an hypothesized list of possible utterances that relate to at least
one of said recognized utterances based on said stored information.

6. The method of claim 5 further including ranking a list of possible utterances based upon how closely each of the 
possible utterances corresponds to one of said utterances and comparing said ranked list of possible utterances 
to said hypothesized list of possible utterances for commonality between said lists. 

7. The method of claim 1 further including creating an hypothesized list of possible utterances that relate to at least
one of said recognized utterances based on said stored information.

8. The method of claim 1 further including creating a list of possible utterances that could be confused with at least
one of said recognized utterances.

9. The method of claim 1 wherein said recognized first utterance is recognized continuously. 

10. The method of claim 1 wherein said recognized second utterance is recognized discretely. 

11. The method of claim 9 further including creating a list of possible first utterances that may be confused with said
recognized first utterance.

12. The method of claim 11 further including creating an hypothesized list of possible second utterances that relate to
said possible first utterances based on said stored information.

13. The method of claim 12 further including adding said recognized second utterance to said hypothesized list of
possible second utterances to create a merged list of possible second utterances.

14. The method of claim 13 further including rerecognizing said merged list of possible second utterances against said
second utterance to get a ranked list of possible second utterances, said ranking based upon how closely each
possible second utterance in said merged list corresponds to said second utterance.

15. The method of claim 14 further including comparing said ranked list of possible second utterances to said
hypothesized list of possible second utterances for commonality between said lists, said highest ranked possible second
utterance in said ranked list being compared first.

16. The method of claim 15 further including creating an hypothesized list of possible first utterances from said second
recognized utterance based on said stored information.



17. The method of claim 16 further including adding said recognized first utterance to said hypothesized list of possible
first utterances to create a merged list of possible first utterances.

18. The method of claim 17 further including rerecognizing said merged list of possible first utterances against said
first utterance to get a ranked list of possible first utterances having a ranking based upon how closely each possible
first utterance in said merged list corresponds to said first utterance.

19. The method of claim 18 further including evaluating a possible first utterance ranked second in said ranked list of
possible first utterances by determining whether a distance parameter associated with said second ranked possible
first utterance is within an acceptable limit.



20. The method of claim 19 further including indicating to a user when said second ranked possible first utterance is
not within said acceptable limit and no commonality exists between said ranked list of possible second utterances
and said hypothesized list of possible second utterances.

21. The method of claim 1 wherein said first utterance comprises a zipstate and said second utterance comprises a
city from a destination address on a package, said determination of the most probable first and second utterances
resulting in the sorting of said package according to the package's destination address.



22. The method of claim 1 wherein said first utterance comprises a word and said second utterance comprises spelled 
prefix including ordered symbols. 

23. The method of claim 22 further including creating a list of possible prefixes that could be confused with said
recognized prefix.

24. The method of claim 23 wherein creating said list of possible prefixes includes determining, in the context of a
preceding symbol or silence, a probability of confusing each recognized symbol in said prefix with each symbol in
a list of possible symbols.

25. The method of claim 23 wherein creating said list of possible prefixes includes determining, in the context of a
preceding symbol or silence, a probability of confusing each recognized symbol in said prefix with more than one
symbol.

26. The method of claim 23 wherein creating said list of possible prefixes includes determining, in the context of a
preceding symbol or silence, a probability of confusing each recognized symbol in said prefix with an absence of
a symbol.

27. The method of claim 23 wherein creating said list of possible prefixes includes replacing a sequence of symbols
with a single symbol.



28. The method of claim 22 wherein said first utterance comprises a spelled word and said determination of the most 
probable first and second utterances results in recognizing said spelled word. 

29. A method of generating a choice list from a continuously recognized utterance comprising:

recognizing a spoken utterance,
consulting stored information to determine the probability of confusing possible utterances in the stored
information with said recognized utterance, and
producing a list of possible utterances from the stored information that could be confused with the recognized
utterance.

30. The method of claim 29 further including rerecognizing said utterance against a merged list of said list of possible
utterances and said recognized utterance to create a ranked list of possible utterances having a ranking based
upon how closely each utterance in said merged list corresponds to said spoken utterance.

31. A method of recognizing ambiguous inputs including:

recognizing a first ambiguous input,
recognizing a second ambiguous input having information that is related to said first ambiguous input, and
determining the most probable first and second ambiguous inputs based on stored information about valid
relationships between possible first and second ambiguous inputs.

32. A method of training a speech recognizer, comprising:

prompting a user to make a first utterance comprising symbols,
recognizing said symbols, and
calculating the probability of confusing each recognized symbol with the prompted symbol.

33. The method of claim 32 wherein said probabilities are calculated within the context of the preceding symbol or
silence.

34. A method of displaying word choices during speech recognition, comprising:

recognizing an uttered word,
recognizing a spelling of a prefix of the word, whereby symbols are used to spell said prefix, and
displaying a list of word choices on a screen for selection, a top choice on the list corresponding to a highest
ranked choice.

35. The method of claim 34 wherein said symbols comprise letters, digits, and punctuation. 

36. The method of claim 29 wherein said spoken work is recognized continuously. 

FIG. 4a